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1 Paper session DB-8 (databases): query optimisation: Towards estimating the 
number of distinct value combinations for a set of attributes 
Xiaohui Yu, Calisto Zuzarte, Kenneth C. Sevcik 

October 2005 Proceedings of the 14th ACM international conference on 
Information and knowledge management CIKM '05 

Publisher: ACM Press 
Full text available: pdf(269.74 KB)
Additional Information: full citation, abstract, references, index terms



Accurately and efficiently estimating the number of distinct values for some 
attribute(s) or sets of attributes in a data set is of critical importance to many 
database operations, such as query optimization and approximation query 
answering. Previous work has focused on the estimation of the number of distinct 
values for a single attribute and most existing work adopts a data sampling 
approach. This paper addresses the equally important issue of estimating the 
number of distinct value combinati ... 



Keywords: cardinality estimation, relational database 



2 Integration of spatial join algorithms for processing multiple inputs 
Nikos Mamoulis, Dimitris Papadias 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD
international conference on Management of data SIGMOD '99, Volume
28 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.66 MB)
Additional Information: full citation, abstract, references, citings, index terms

Several techniques that compute the join between two spatial datasets have been 
proposed during the last decade. Among these methods, some consider existing 
indices for the joined inputs, while others treat datasets with no index, providing 
solutions for the case where at least one input comes as an intermediate result of 
another database operator. In this paper we analyze previous work on spatial joins 
and propose a novel algorithm, called slot index spatial join (SISJ), t ... 

Keywords: query optimization, spatial joins, spatial query processing 



3 Research sessions: statistics: A bi-level Bernoulli scheme for database sampling
Peter J. Haas, Christian Konig 






10/27/2006 7:44 PM 



Results (page 1): +estimate +cardinality +"data skew" +query +optim... http://portal.acm.org/results.cfm?CFID=3246953&CFTOKEN=954..



June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(344.80 KB)
Additional Information: full citation, abstract, references

Current database sampling methods give the user insufficient control when 
processing ISO-style sampling queries. To address this problem, we provide a 
bi-level Bernoulli sampling scheme that combines the row-level and page-level 
sampling methods currently used in most commercial systems. By adjusting the 
parameters of the method, the user can systematically trade off processing speed 
and statistical precision— the appropriate choice of parameter settings becomes a 
query optimization problem. We ... 

4 Fast algorithms for universal quantification in large databases
Goetz Graefe, Richard L. Cole 

June 1995 ACM Transactions on Database Systems (TODS), Volume 20 Issue 2 
Publisher: ACM Press 

Full text available: pdf(3.51 MB)
Additional Information: full citation, abstract, references, citings, index terms, review



Universal quantification is not supported directly in most database systems despite 
the fact that it adds significant power to a system's query processing and inference 
capabilities, in particular for the analysis of many-to-many relationships and of 
set-valued attributes. One of the main reasons for this omission has been that 
universal quantification algorithms and their performance have not been explored for 
large databases. In this article, we describe and compare three known algorithms ... 

5 Space efficient bitmap indexing
Nick Koudas 

November 2000 Proceedings of the ninth international conference on 
Information and knowledge management 

Publisher: ACM Press 

Full text available: pdf(268.42 KB)
Additional Information: full citation, references, citings, index terms



6 Advanced SQL modeling in RDBMS

Andrew Witkowski, Srikanth Bellamkonda, Tolga Bozkaya, Nathan Folkert, Abhinav 
Gupta, John Haydu, Lei Sheng, Sankar Subramanian 

March 2005 ACM Transactions on Database Systems (TODS), Volume 30 Issue 1 
Publisher: ACM Press 

Full text available: pdf(279.06 KB)
Additional Information: full citation, abstract, references, index terms

Commercial relational database systems lack support for complex business modeling. 
ANSI SQL cannot treat relations as multidimensional arrays and define multiple, 
interrelated formulas over them, operations which are needed for business modeling. 
Relational OLAP (ROLAP) applications have to perform such tasks using joins, SQL 
Window Functions, complex CASE expressions, and the GROUP BY operator 
simulating the pivot operation. The designated place in SQL for calculations is the 
SELECT clause, whi ... 

Keywords: Excel, OLAP, analytic computations, spreadsheet 
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Paolo Ciaccia, Marco Patella 

December 2002 ACM Transactions on Database Systems (TODS), Volume 27 Issue 4 
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Novel database applications, such as multimedia, data mining, e-commerce, and 
many others, make intensive use of similarity queries in order to retrieve the objects 
that better fit a user request. Since the effectiveness of such queries improves when 
the user is allowed to personalize the similarity criterion according to which database 
objects are evaluated and ranked, the development of access methods able to 
efficiently support user-defined similarity queries becomes a basic requirement. In t 



Keywords: Distance metrics, user-defined queries 



8 XML indexing and compression: Efficient processing of joins on set-valued 
attributes 
Nikos Mamoulis 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(678.13 KB)
Additional Information: full citation, abstract, references, citings, index terms

Object-oriented and object-relational DBMS support set valued attributes, which are 
a natural and concise way to model complex information. However, there has been 
limited research to-date on the evaluation of query operators that apply on sets. In 
this paper we study the join of two relations on their set-valued attributes. Various 
join types are considered, namely the set containment, set equality, and set overlap 
joins. We show that the inverted file, a powerful index for selection queries, c ... 

9 Fast joins using join indices
Zhe Li, Kenneth A. Ross 

April 1999 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 8 Issue 1 
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(263.06 KB)
Additional Information: full citation, abstract, index terms

Two new algorithms, "Jive join" and "Slam join," are proposed for computing the join 
of two relations using a join index. The algorithms are duals: Jive join 
range-partitions input relation tuple ids and then processes each partition, while 
Slam join forms ordered runs of input relation tuple ids and then merges the results. 
Both algorithms make a single sequential pass through each input relation, in 
addition to one pass through the join index and two passes through a te ... 

Keywords: Decision support systems, Query processing 



10 Generalized multidimensional data mapping and query processing 
Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan 

September 2005 ACM Transactions on Database Systems (TODS), Volume 30 Issue 3

Publisher: ACM Press 

Full text available: pdf(689.08 KB)
Additional Information: full citation, abstract, references, index terms

Multidimensional data points can be mapped to one-dimensional space to exploit 
single dimensional indexing structures such as the B+-tree. In this article we
present a Generalized structure for data Mapping and query Processing (GiMP), 
which supports extensible mapping methods and query processing. GiMP can be 
easily customized to behave like many competent indexing mechanisms for 
multi-dimensional indexing, such as the UB-Tree, the Pyramid technique, the 
iMinMax, and the iDistan ... 

Keywords: Indexing, data mapping, efficiency 
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11 Query evaluation techniques for large databases 
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), Volume 25 Issue 2 
Publisher: ACM Press 

Full text available: pdf(9.37 MB)
Additional Information: full citation, abstract, references, citings, index terms, review

Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and 
extensible database systems will not solve this problem. On the contrary, modern 
data models exacerbate the problem: In order to manipulate large sets of complex 
objects as efficiently as today's database systems manipulate simple records, 
query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, 
extensible database systems, iterators, object-oriented database systems, operator 
model of parallelization, parallel algorithms, relational database systems, 
set-matching algorithms, sort-hash duality 



12 Research sessions: query progress: Estimating progress of execution for SQL 
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Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy 
June 2004 Proceedings of the 2004 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press 

Full text available: pdf(201.38 KB)
Additional Information: full citation, abstract, references

Today's database systems provide little feedback to the user/DBA on how much of a 
SQL query's execution has been completed. For long running queries, such feedback 
can be very useful, for example, to help decide whether the query should be 
terminated or allowed to run to completion. Although the above requirement is easy 
to express, developing a robust indicator of progress for query execution is 
challenging. In this paper, we study the above problem and present techniques that 
can form the basi ... 

13 Research papers: OLAP: Efficient computation of multiple group by queries
Zhimin Chen, Vivek Narasayya 

June 2005 Proceedings of the 2005 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf(371.92 KB)
Additional Information: full citation, abstract, references

Data analysts need to understand the quality of data in the warehouse. This is often 
done by issuing many Group By queries on the sets of columns of interest. Since the 
volume of data in these warehouses can be large, and tables in a data warehouse 
often contain many columns, this analysis typically requires executing a large 
number of Group By queries, which can be expensive. We show that the 
performance of today's database systems for such data analysis is inadequate. We 
also show that the pro ... 

14 Research sessions: compression: Wavelet synopses with error guarantees 
Minos Garofalakis, Phillip B. Gibbons 

June 2002 Proceedings of the 2002 ACM SIGMOD international conference on 
Management of data SIGMOD '02 

Publisher: ACM Press 
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Recent work has demonstrated the effectiveness of the wavelet decomposition in 
reducing large amounts of data to compact sets of wavelet coefficients (termed 
"wavelet synopses") that can be used to provide fast and reasonably accurate 
approximate answers to queries. A major criticism of such techniques is that unlike, 
for example, random sampling, conventional wavelet synopses do not provide 
informative error guarantees on the accuracy of individual approximate answers. In 
fact, as this paper de ... 

15 Processing queries for first-few answers 
Roberto J. Bayardo, Daniel P. Miranker 

November 1996 Proceedings of the fifth international conference on Information 
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(894.67 KB)
Additional Information: full citation, references, citings, index terms



16 Probabilistic wavelet synopses 
Minos Garofalakis, Phillip B. Gibbons 

March 2004 ACM Transactions on Database Systems (TODS), Volume 29 Issue 1 
Publisher: ACM Press 

Full text available: pdf(668.62 KB)
Additional Information: full citation, abstract, references, index terms

Recent work has demonstrated the effectiveness of the wavelet decomposition in 
reducing large amounts of data to compact sets of wavelet coefficients (termed 
"wavelet synopses") that can be used to provide fast and reasonably accurate 
approximate query answers. A major shortcoming of these existing wavelet 
techniques is that the quality of the approximate answers they provide varies widely, 
even for identical queries on nearly identical values in distinct parts of the data. As a 
result, users ha ... 

Keywords: Wavelets, approximate query processing, data synopses, randomized 
rounding 



17 On applying hash filters to improving the execution of multi-join queries 
Ming-Syan Chen, Hui-I Hsiao, Philip S. Yu 

May 1997 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 6 Issue 2 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(318.66 KB)
Additional Information: full citation, abstract, index terms

In this paper, we explore an approach of interleaving a bushy execution tree with 
hash filters to improve the execution of multi-join queries. Similar to semi-joins in 
distributed query processing, hash filters can be applied to eliminate non-matching 
tuples from joining relations before the execution of a join, thus reducing the join 
cost. Note that hash filters built in different execution stages of a bushy tree can 
have different costs and effects. The effect of hash filters is evaluated fir ...

Keywords: Bushy trees, Hash filters, Parallel query processing, Sort-merge joins 
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Physical database design is important for query performance in a shared-nothing 
parallel database system, in which data is horizontally partitioned among multiple 
independent nodes. We seek to automate the process of data partitioning. Given a 
workload of SQL statements, we seek to determine automatically how to partition 
the base data across multiple nodes to achieve overall optimal (or close to optimal) 
performance for that workload. Previous attempts use heuristic rules to make those 
decision ... 

19 On parallel execution of multiple pipelined hash joins
Hui-I Hsiao, Ming-Syan Chen, Philip S. Yu 

May 1994 ACM SIGMOD Record, Proceedings of the 1994 ACM SIGMOD
international conference on Management of data SIGMOD '94, Volume
23 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.24 MB)
Additional Information: full citation, abstract, references, citings, index terms

In this paper we study parallel execution of multiple pipelined hash joins. 
Specifically, we deal with two issues, processor allocation and the use of hash filters, 
to improve parallel execution of hash joins. We first present a scheme to transform a 
bushy execution tree to an allocation tree, where each node denotes a pipeline. 
Then, processors are allocated to the nodes in the allocation tree based on the 
concept of synchronous execution time such that inner relations (i.e., hash tables) ... 

20 Implementing sorting in database systems
Goetz Graefe 

September 2006 ACM Computing Surveys (CSUR), Volume 38 Issue 3 
Publisher: ACM Press 

Full text available: pdf(518.63 KB)
Additional Information: full citation, abstract, references, index terms

Most commercial database systems do (or should) exploit many sorting techniques 
that are publicly known, but not readily available in the research literature. These 
techniques improve both sort performance on modern computer systems and the 
ability to adapt gracefully to resource fluctuations in multiuser operations. This 
survey collects many of these techniques for easy reference by students, 
researchers, and product developers. It covers in-memory sorting, disk-based 
external sorting, and cons ... 

Keywords: Key normalization, asynchronous read-ahead, compression, dynamic 
memory resource allocation, forecasting, graceful degradation, index operations, key 
conditioning, nested iteration 
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Solutions Journal: March 99 

As we will explain later, data skew tends to be an issue more with ad hoc query ... The
formula 1/cardinality is fine for values which occur with close to ...
www.idug.org/member/journal/mar99/improving_db2.html - 29k -
Cached - Similar pages 

Solutions Journal: Tuning SQL with Statistics - August 99 

Frequency statistics are helpful at identifying data skew, which occurs when some
column values appear more ... Estimate concatenated column cardinality ...
www.idug.org/member/journal/aug99/feature1.html - 18k - Cached - Similar pages

Inside the Oracle Optimizer, Part 2 

To minimize intermediate results, the optimizer attempts to estimate the cardinality of 
each result set during the parse phase of SQL execution. ... 
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2.3 Estimating the cardinality of distinct, values in a set from a data sample. The 
ability to estimate the cardinality of a set from a ... 
www.vldb.org/conf/2004/IND5P5.PDF - Similar pages 

[PDF] Estimation of Query-Result Distribution and its Application in — 
File Format: PDF/Adobe Acrobat - View as HTML 

sive for data with high cardinality. Hence, most commercial database systems
maintain some ... to estimate the cost formula most accurately, we need rea- ...

www.vldb.org/conf/1996/P448.PDF - Similar pages 

[PDF] Data reduction by partial preaggregation - Data Engineering, 2002 ... 
File Format: PDF/Adobe Acrobat 

Estimating output size. To incorporate partial preaggregation into query optimization
we must be able to estimate the cardinality of the output. ...
ieeexplore.ieee.org/iel5/7807/21451/00994787.pdf - Similar pages 

[PDF] Building large ROLAP data cubes in parallel - Database Engineering ...
File Format: PDF/Adobe Acrobat 

sizes and data skew. For a fact table with 16 million rows ... we estimate val- ... The 
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File Format: PDF/Adobe Acrobat 

Also, the estimate of the cardinality of the large skew value (shown in black) in the ...
timately in estimating the subtask time for the composite hash ...
doi.ieeecomputersociety.org/10.1109/71.250117 - Similar pages
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File Format: PDF/Adobe Acrobat 
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Apparatus and method for estimating cardinality when data skew is ... 
The improved cardinality estimate may then be used to ... Apparatus and method for
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www.freshpatents.com/ 

Apparatus-and-method-for-estimating-cardinality-when-data-skew-is-present-dt20050616... 
- 20k - Cached - Similar pages 

Method and apparatus for predicting selectivity of database query ... 

The estimate of the size of the result set might be performed by the optimizer or some
other facility in the database management system. Estimating the ...
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Method-and-apparatus-for-predicting-selectivity-of-database-query-join-conditions-us... 
- 92k - Supplemental Result - Cached - Similar pages 
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Apparatus and method for estimating cardinality when data skew is ... 
Without a way to estimate cardinality in a manner that accounts for data skew, the 
computer industry will continue to suffer from inaccurate estimates of ... 
www.patentdebate.com/PATAPP/20050131914 - Supplemental Result - Similar pages 

[PDF] LEO - DB2's LEarning Optimizer 

File Format: PDF/Adobe Acrobat - View as HTML 

errors is the cardinality estimate on which costs depend. Cost estimates might be off 
by 10 ... columns in stored tables, for estimating the selectivity of ... 
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cardinality estimation of various relational operators. For example,, the textbook 
formula (which is also widely used in commercial sys- ... 
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skew and even if all partitions contain the same number of records, ... 
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